WE CLAIM:

1. Method for generating facial animation values using a sequence of facial image 
frames and synchronously captured audio data of a speaking actor, comprising the steps for: 

providing a plurality of visual-facial-animation values based on tracking of facial features 
in the sequence of facial image frames of the speaking actor; 

providing a plurality of audio-facial-animation values based on visemes detected using 
the synchronously captured audio voice data of the speaking actor; and 

combining the plurality of visual facial animation values and the plurality of audio facial 
animation values to generate output facial animation values for use in facial animation. 

2. Method for generating facial animation values as defined in claim 1, wherein the 
output facial animation values associated with a mouth for a facial animation are based only on 
the respective mouth-associated values of the plurality of audio facial animation values. 

3. Method for generating facial animation values as defined in claim 1, wherein the 
output facial animation values associated with a mouth for a facial animation are based on a 
weighted average of the respective mouth-associated values of the plurality of visual facial 
animation values and the respective mouth-associated values of the plurality of audio facial 
animation values. 
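By way of non-limiting illustration, the weighted average recited in claim 3 may be sketched as follows. The function name `blend_mouth_values`, the parameter `audio_weight`, and its default of 0.7 are hypothetical and do not appear in the claims; the sketch assumes that both sets of mouth-associated values are aligned lists of equal length.

```python
from typing import List, Sequence


def blend_mouth_values(
    visual: Sequence[float],
    audio: Sequence[float],
    audio_weight: float = 0.7,  # hypothetical default, not taken from the claims
) -> List[float]:
    """Return the per-value weighted average of visual and audio mouth values."""
    if len(visual) != len(audio):
        raise ValueError("visual and audio value lists must have the same length")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    # Each output value mixes the corresponding visual and audio values.
    return [audio_weight * a + visual_weight * v for v, a in zip(visual, audio)]
```

Setting `audio_weight` to 1.0 in this sketch reduces to the audio-only case of claim 2.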

4. Method for generating facial animation values as defined in claim 1, wherein the 
output facial animation values associated with a mouth for a facial animation are based on 
Kalman filtering of the respective mouth-associated values of the plurality of visual facial 
animation values and the respective mouth-associated values of the plurality of audio facial 
animation values. 
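By way of non-limiting illustration, the Kalman filtering recited in claim 4 may be sketched for a single mouth-associated value as follows. The sketch assumes a scalar random-walk state model and treats the visual and audio values as two independent noisy measurements of the same quantity in each frame. The function name `kalman_fuse` and all variance defaults are hypothetical and are not taken from the claims.

```python
from typing import Iterable, List, Tuple


def kalman_fuse(
    measurements: Iterable[Tuple[float, float]],
    visual_var: float = 0.04,   # assumed noise variance of the visual value
    audio_var: float = 0.01,    # assumed noise variance of the audio value
    process_var: float = 0.02,  # assumed frame-to-frame drift of the true value
    initial_value: float = 0.0,
    initial_var: float = 1.0,
) -> List[float]:
    """Fuse (visual, audio) value pairs, one pair per frame, with a scalar Kalman filter."""
    estimate, variance = initial_value, initial_var
    fused: List[float] = []
    for visual, audio in measurements:
        # Predict: under a random-walk model the estimate carries over
        # and its uncertainty grows by the process variance.
        variance += process_var
        # Update: apply the two measurements one after the other.
        for value, noise_var in ((visual, visual_var), (audio, audio_var)):
            gain = variance / (variance + noise_var)
            estimate += gain * (value - estimate)
            variance *= 1.0 - gain
        fused.append(estimate)
    return fused
```

In this sketch the measurement with the smaller assumed variance pulls the estimate more strongly, so the relative trust placed in the audio and visual values is set by `audio_var` and `visual_var`.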

5. Method for generating facial animation values as defined in claim 1, wherein the step 
of combining the plurality of visual facial animation values and the plurality of audio facial 
animation values to generate output facial animation values includes detecting whether speech is 
occurring in the synchronously captured audio voice data of the speaking actor and, while speech
is detected as occurring, generating the output facial animation values associated with a mouth 
based only on the respective mouth-associated values of the plurality of audio facial animation 
values and, while speech is not detected as occurring, generating the output facial animation 
values associated with a mouth based only on the respective mouth-associated values of the 
plurality of visual facial animation values. 
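By way of non-limiting illustration, the speech-dependent selection recited in claim 5 may be sketched as follows. The claims do not specify how speech is detected; the sketch assumes a simple mean-energy threshold on the audio samples of one frame. The names `is_speech` and `select_mouth_values` and the threshold value are hypothetical.

```python
from typing import List, Sequence


def is_speech(samples: Sequence[float], energy_threshold: float = 0.01) -> bool:
    """Assumed detector: speech is present when mean sample energy exceeds a threshold."""
    if not samples:
        return False
    energy = sum(s * s for s in samples) / len(samples)
    return energy > energy_threshold


def select_mouth_values(
    visual: Sequence[float],
    audio: Sequence[float],
    frame_samples: Sequence[float],
    energy_threshold: float = 0.01,  # hypothetical value, not from the claims
) -> List[float]:
    """Use audio-derived mouth values while speech is detected, visual ones otherwise."""
    if is_speech(frame_samples, energy_threshold):
        return list(audio)
    return list(visual)
```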

6. Method for generating facial animation values as defined in claim 1, wherein the 
tracking of facial features in the sequence of facial image frames of the speaking actor is 
performed using bunch graph matching. 

7. Method for generating facial animation values as defined in claim 1, wherein the 
tracking of facial features in the sequence of facial image frames of the speaking actor is 
performed using transformed facial image frames generated based on wavelet transformations. 

8. Method for generating facial animation values as defined in claim 1, wherein the 
tracking of facial features in the sequence of facial image frames of the speaking actor is 
performed using transformed facial image frames generated based on Gabor wavelet 
transformations. 
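By way of non-limiting illustration, a Gabor wavelet transformation of the kind recited in claim 8 may be sketched as follows. The sketch assumes a plain complex Gabor kernel (a Gaussian envelope multiplied by a complex sinusoid) without the DC-compensation term that some formulations include, and it evaluates the filter response at a single pixel of a grayscale image stored as a list of rows. The names `gabor_kernel` and `gabor_response` and all parameters are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def gabor_kernel(
    size: int, wavelength: float, orientation: float, sigma: float
) -> List[List[complex]]:
    """Build a size x size complex Gabor kernel; size is assumed to be odd."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel: List[List[complex]] = []
    for y in range(-half, half + 1):
        row: List[complex] = []
        for x in range(-half, half + 1):
            # Coordinate along the wave direction given by the orientation angle.
            along = x * math.cos(orientation) + y * math.sin(orientation)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = cmath.exp(2j * math.pi * along / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(
    image: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[complex]],
    row: int,
    col: int,
) -> complex:
    """Correlate the kernel with the image at (row, col); outside pixels count as zero."""
    half = len(kernel) // 2
    total = 0j
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            r, c = row + dy, col + dx
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += image[r][c] * kernel[dy + half][dx + half]
    return total
```

Evaluating `gabor_response` at a tracked feature location for several wavelengths and orientations yields a vector of complex responses that describes the local image structure around that feature.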

9. Apparatus for generating facial animation values using a sequence of facial image 
frames and synchronously captured audio data of a speaking actor, comprising: 

means for providing a plurality of visual-facial-animation values based on tracking of 
facial features in the sequence of facial image frames of the speaking actor; 

means for providing a plurality of audio-facial-animation values based on visemes 
detected using the synchronously captured audio voice data of the speaking actor; and 

means for combining the plurality of visual facial animation values and the plurality of 
audio facial animation values to generate output facial animation values for use in facial 
animation.

10. Apparatus for generating facial animation values as defined in claim 9, wherein the
output facial animation values associated with a mouth for a facial animation are based only on 
the respective mouth-associated values of the plurality of audio facial animation values. 

11. Apparatus for generating facial animation values as defined in claim 9, wherein the
output facial animation values associated with a mouth for a facial animation are based on a
weighted average of the respective mouth-associated values of the plurality of visual facial
animation values and the respective mouth-associated values of the plurality of audio facial
animation values.

12. Apparatus for generating facial animation values as defined in claim 9, wherein the
output facial animation values associated with a mouth for a facial animation are based on
Kalman filtering of the respective mouth-associated values of the plurality of visual facial
animation values and the respective mouth-associated values of the plurality of audio facial
animation values.

13. Apparatus for generating facial animation values as defined in claim 9, wherein the
means for combining the plurality of visual facial animation values and the plurality of audio
facial animation values to generate output facial animation values includes means for detecting
whether speech is occurring in the synchronously captured audio voice data of the speaking actor
and, while speech is detected as occurring, generating the output facial animation values
associated with a mouth based only on the respective mouth-associated values of the plurality of
audio facial animation values and, while speech is not detected as occurring, generating the
output facial animation values associated with a mouth based only on the respective
mouth-associated values of the plurality of visual facial animation values.

14. Apparatus for generating facial animation values as defined in claim 9, wherein the 
tracking of facial features in the sequence of facial image frames of the speaking actor is 
performed using bunch graph matching. 

15. Apparatus for generating facial animation values as defined in claim 9, wherein the
tracking of facial features in the sequence of facial image frames of the speaking actor is 
performed using transformed facial image frames generated based on wavelet transformations. 

16. Apparatus for generating facial animation values as defined in claim 9, wherein the 
tracking of facial features in the sequence of facial image frames of the speaking actor is 
performed using transformed facial image frames generated based on Gabor wavelet 
transformations. 